The TV News Archive's Third Eye project captures the chyrons–or narrative text–that appear on the lower third of TV news screens and turns them into downloadable data and a Twitter feed for research, journalism, online tools, and other projects. At project launch (September 2017) we are collecting chyrons from BBC News, CNN, Fox News, and MSNBC–more than four million collected over just two weeks. Chyrons have public value because:
Breaking news often appears on chyrons before TV newscasters begin reporting or video is available, whether it's a hurricane or a breaking political story.
Which chyrons a TV news network chooses to display can reveal editorial decisions that can inform public understanding of how news is filtered for different audiences.
Providing chyrons as data–and also on Twitter–in near real-time can serve as an alert system, showing how TV news stations are reporting the news. Often the chyrons are ahead of the general conversation on Twitter.
Both raw and filtered data feeds are available for download at:
The work of the Internet Archive's TV architect Tracey Jaquith, the Third Eye project applies OCR to the "lower thirds" of TV cable news screens to capture the text that appears there. The chyrons are not captions, which provide the text for what people are saying on screen, but rather are text narrative that accompanies news broadcasts.
Created in real-time by TV news editors, chyrons sometimes include misspellings. The OCR process frequently introduces further errors where text is not recognized correctly, leading to entries that may be garbled. To make sense out of the noise, Jaquith applies algorithms that choose the most representative chyrons from each channel collected over 60-second increments. This cleaned-up feed is what fuels the Twitter bots that post which chyrons are appearing on TV news screens.
We provide options to download this filtered feed, or the raw content nearly as soon as it appears on the TV screen. Both may be useful depending on the type of project. In addition, the Twitter feed itself is a good source to see what the filtered feed looks like.
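For readers who want to experiment with their own filtering of the raw feed, here is a minimal sketch of one possible approach. It is not the algorithm the Third Eye project uses, which is not described in detail here. It groups raw entries by channel and minute, then keeps the entry whose text is most similar to the others in that group, on the assumption that OCR errors vary from one capture to the next while the underlying chyron stays the same. The row layout (timestamp, channel, text), the timestamp format, and the function names are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from difflib import SequenceMatcher


def most_representative(texts):
    """Return the text with the highest total similarity to the other texts."""
    def total_similarity(i):
        return sum(
            SequenceMatcher(None, texts[i], texts[j]).ratio()
            for j in range(len(texts))
            if j != i
        )

    best_index = max(range(len(texts)), key=total_similarity)
    return texts[best_index]


def filter_feed(rows):
    """Pick one chyron per channel per minute.

    rows: iterable of (utc_timestamp, channel, text) tuples. This layout and
    the "YYYY-MM-DD HH:MM:SS" timestamp format are assumptions made for
    illustration, not the documented format of the download.
    """
    buckets = defaultdict(list)
    for stamp, channel, text in rows:
        minute = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(second=0)
        buckets[(channel, minute)].append(text)
    return {key: most_representative(texts) for key, texts in sorted(buckets.items())}
```

Character-level similarity is only one way to define "most representative"; weighting by on-screen duration or discarding very short entries are other options worth trying.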
Some notes on the data
Chyrons are derived in near real-time from the TV News Archive's collection of TV news. The constantly updating public collection contains 1.4 million TV news shows, some dating back to 2009.
At launch, Third Eye captures four TV cable news channels: BBC News, CNN, Fox News, and MSNBC.
Data can be affected by temporary collection outages, which typically last minutes or hours, but rarely more. If you are concerned about a specific time gap in a feed and would like to know if it's the result of an outage, please inquire at firstname.lastname@example.org.
The "raw feed" option provides all of the OCR'ed text from chyrons at the rate of approximately one entry per second. The "filtered tweets" download provides the data feed that fuels our Twitter bots; this has been filtered to find the most representative, clearest chyrons from a 60-second period, with no more than one entry/tweet per minute (though the duration may be shorter than 60 seconds). The filtered feed relies on algorithms that are a work in progress; we invite you to share your ideas on how to effectively filter the noise from the raw data.
Dates/times are in UTC (Coordinated Universal Time).
Because the raw data is so large (about 20 megabytes per day), we limit results to seven days per request.
We began collecting raw data on August 25, 2017; the filtered feed begins on September 7, 2017.
"Duration" column is in seconds–the amount of time that particular chyron appeared on the screen.
To open a TSV file in a program such as Google Sheets or Excel, you'll need to download the text file and use "save as" to save it as a file on your computer. Then you can use the import function to pull the text file into your program, using "tab" as your delimiter.
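If you would rather work with the data in code than in a spreadsheet, the notes above can be sketched in Python. The first function splits a longer date range into windows of at most seven days, to fit the per-request limit; treating the limit as seven inclusive calendar days is an assumption. The second reads a downloaded file using tab as the delimiter. It returns plain lists of strings and assumes nothing about the column order or whether there is a header row, so check a downloaded file before relying on column positions. Both function names are hypothetical.

```python
import csv
import io
from datetime import date, timedelta


def seven_day_windows(start, end, max_days=7):
    """Split the inclusive date range [start, end] into windows of at most max_days days."""
    windows = []
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=max_days - 1), end)
        windows.append((cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows


def read_tsv(handle):
    """Parse tab-delimited rows from an open text file into lists of strings.

    QUOTE_NONE keeps stray quotation marks in OCR'ed text as literal
    characters instead of treating them as field quoting.
    """
    return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


# Example: a 17-day range becomes three requests' worth of dates.
for window_start, window_end in seven_day_windows(date(2017, 9, 1), date(2017, 9, 17)):
    print(window_start.isoformat(), "to", window_end.isoformat())

# Example with in-memory text; for a saved download you might instead use
# open("chyrons.tsv", newline="", encoding="utf-8"), where the filename is hypothetical.
sample = io.StringIO("2017-09-07 14:00:01\tCNN\t3\tHURRICANE IRMA\n")
print(read_tsv(sample))
```

Since dates and times in the feeds are in UTC, any timestamps you parse from these rows should be treated as UTC before converting to a local time zone.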
We want to hear from you! Please contact us with questions, feedback, concerns – and also to tell us what project you've done with the TV News Archive's Third Eye project: email@example.com. Follow us @tvnewsarchive, and subscribe to our weekly newsletter here.